A controversial topic (Gun Violence)
Show one edge of the problem, the tip of the iceberg.
Raise your awareness of the problem so that you can draw your own conclusions and protect yourself and your family.
Recognize the early warning signs, not to live in suspicion but to sharpen your awareness.
“If you look at the number of Americans killed since 9/11 by terrorism, it’s less than 100. If you look at the number that have been killed by gun violence, it’s in the tens of thousands.”
A few words on my data set
164 observations and 26 variables, covering 1960 to 2023 (originally 143 observations and 25 variables, covering 1982 to 2022).
My data set’s limitation:
https://www.motherjones.com/politics/2012/12/mass-shootings-mother-jones-full-data/
The data set’s authors are Mark Follman, Gavin Aronsen, and Deanna Pan.
They collect cases from 1982 to the present; the set originally included 143 observations and 25 variables. This is the main data frame for my work.
The second source, the Violence Project Mass Shooter Database, names no specific authors.
It spans 1960 to 2022 and supplements my main data frame.
my_data <- read.csv('Mother Jones - Mass Shootings Database, 1982 - 2023 - Sheet1.csv', na.strings = "-")
my_data2 <- read.csv('Violence Project Mass Shooter Database - Version 6.1 - Full Database.csv', na.strings = "-")
Both are CSV files.
First set: similar to what we have learned so far.
Second set: highly complicated.
Missing data:
NA (Not Available) - Missing value
Unclear - The information has not been revealed by the authorities.
TBD (To Be Determined) - The information has not been confirmed yet.
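The three labels above behave differently in R: only `NA` is seen by `is.na()`, while `Unclear` and `TBD` are ordinary strings. A minimal sketch of counting both kinds per column, assuming a small made-up data frame (`toy` and its values are hypothetical, not rows from the real data):

```r
# Toy data frame with hypothetical values, mirroring the three labels above.
toy <- data.frame(
  weapons_obtained_legally = c("Yes", NA, "TBD", "No", "Unclear"),
  race = c("White", "Unclear", "Black", NA, "Asian"),
  stringsAsFactors = FALSE
)

# Count true NA values per column.
na_counts <- colSums(is.na(toy))

# Count placeholder labels per column; they are not NA,
# so is.na() alone would miss them.
placeholders <- c("Unclear", "TBD")
placeholder_counts <- sapply(toy, function(col) sum(col %in% placeholders))

print(na_counts)
print(placeholder_counts)
```

Keeping `Unclear` as its own label, rather than folding it into `NA`, lets it show up as a category in later tables.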
glimpse(my_data)
## Rows: 164
## Columns: 26
## $ case <chr> "Louisville bank shooting", "Nashvill…
## $ city <chr> "Louisville", "Nashville", "East Lans…
## $ state <chr> "KY", "TN", "MI", "CA", "CA", "VA", "…
## $ date <chr> "4/10/2023", "3/27/2023", "2/13/2023"…
## $ summary <chr> "Connor Sturgeon, 25, opened fire ins…
## $ fatalities <int> 5, 6, 3, 7, 11, 6, 5, 3, 5, 3, 7, 3, …
## $ injured <int> 8, 6, 5, 1, 10, 6, 25, 2, 2, 2, 46, 0…
## $ total_victims <int> 13, 12, 8, 8, 21, 12, 30, 5, 7, 5, 53…
## $ location <chr> "workplace", "School", "School", "wor…
## $ age_of_shooter <int> 25, 28, 43, 67, 72, 31, 22, 22, 15, 2…
## $ prior_signs_mental_health_issues <chr> "Yes", NA, NA, NA, "Yes", NA, "Yes", …
## $ mental_health_details <chr> NA, NA, NA, NA, "According to the LA …
## $ weapons_obtained_legally <chr> "Yes", "Yes", "Yes", NA, NA, NA, NA, …
## $ where_obtained <chr> "gun dealership in Louisville", NA, N…
## $ weapon_type <chr> "Semiautomatic Rifle", "One Semiautom…
## $ weapon_details <chr> "AR-15 rifle", NA, NA, NA, NA, NA, NA…
## $ race <chr> "White", "White", "Black", "Asian", "…
## $ gender <chr> "M", "F (\"identifies as transgender\…
## $ sources <chr> "https://apnews.com/article/downtown-…
## $ mental_health_sources <chr> NA, NA, NA, NA, "https://www.latimes.…
## $ sources_additional_age <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ latitude <dbl> NA, NA, NA, NA, NA, 36.77262, 38.8809…
## $ longitude <dbl> NA, NA, NA, NA, NA, -76.25128, -104.7…
## $ type <chr> "Mass", "Mass", "Mass", "Spree", "Mas…
## $ year <int> 2023, 2023, 2023, 2023, 2023, 2022, 2…
## $ day_of_week <chr> "Monday", "Monday", "Monday", "Monday…
Case: Name of the case as known in the media
City/State: Location where the incident happened
Date/Year/Day of Week: Specific day and year when the incident occurred
Summary: Summary of the case
Fatalities/Injured/Total Victims: Casualty figures from the incident
Location: Type of location where the incident occurred
Age of Shooter/Race/Gender/Prior Signs of Mental Health Issues: The shooter’s profile
Weapons Obtained Legally/Where Obtained/Weapon Type/Weapon Details: The weapons profile
Sources/Mental Health Sources/Additional Age Sources: All sources used to compile this data set
Longitude/Latitude: GPS coordinates of the incident’s location
Type: Whether the incident is designated a Mass Shooting or a Shooting Spree
The tidyverse.
Data wrangling and plots.
# install.packages("tidyverse")
library(tidyverse)  # attaches dplyr, stringr, and ggplot2, among others
Every column was read in as character, even the numeric variables.
The raw data has a limitation in the location variable: most nightlife venues such as bars, clubs, and restaurants were classified as Other, which leaves the original data with a large number of observations in the Other category.
# Assign the variables to the data type of my choice.
my_data$age_of_shooter <- as.integer(my_data$age_of_shooter)
my_data$fatalities <- as.integer(my_data$fatalities)
my_data$injured <- as.integer(my_data$injured)
my_data$total_victims <- as.integer(my_data$total_victims)
my_data$latitude <- as.numeric(my_data$latitude)
my_data$longitude <- as.numeric(my_data$longitude)
# remove newline
my_data$location <- str_replace_all(my_data$location,'[\r\n]','')
my_data$race <- str_replace_all(my_data$race,'[\r\n]','')
# replace a string by another one
my_data$location[my_data$location == 'religious' | my_data$location == 'Religious'] <- 'Religious Place'
my_data$location[my_data$location == 'workplace'] <- 'Workplace'
my_data$race[my_data$race == 'unclear'] <- 'Unclear'
my_data$race[my_data$race == 'black'] <- 'Black'
my_data$race[my_data$race == 'white'] <- 'White'
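# Row 2 carries a free-text gender note; relabel it by position (the index is specific to this version of the file).
my_data$gender[2] <- 'Trans'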
my_data$gender[my_data$gender == 'M'] <- 'Male'
my_data$gender[my_data$gender == 'F'] <- 'Female'
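The replacements above list each spelling variant by hand. As an alternative sketch, a small helper can normalize case and stray line breaks in one pass. `normalize_label` and the `raw_race` values are hypothetical, and the helper only suits single-word labels (it would turn "Native American" into "Native american"):

```r
# Hypothetical raw labels with inconsistent case and a stray newline.
raw_race <- c("white", "White\n", "black", "Black", "unclear", NA)

# Capitalize the first letter and lower-case the rest.
normalize_label <- function(x) {
  x <- gsub("[\r\n]", "", x)  # drop embedded line breaks
  x <- trimws(x)              # drop leading and trailing spaces
  out <- paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

clean_race <- normalize_label(raw_race)
print(table(clean_race, useNA = "ifany"))
```

Explicit replacements remain the safer choice when a label needs a new name, as with `religious` becoming `Religious Place`.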
The FBI defines a mass shooting as any incident in which at least four people are murdered with a gun.
Does the time frame say anything about the incidents in general?
Do age, race, and gender give any insight into the shooter’s profile?
What stands out if we take the shooters with prior mental health issues out of the equation?
In which locations are the incidents most likely to take place?
What types of weapons are the assailants most likely to use?
What conclusions can we draw about the shooter’s age, race, and prior mental health issues?
What is interesting about the relationship between the shooter’s age and the year?
Does gender play any role in relation to the shooter’s age?
How are the incidents distributed across America?
## age_of_shooter fatalities injured total_victims
## Min. :11.0 Min. : 3.0 Min. : 0.00 Min. : 3.00
## 1st Qu.:23.0 1st Qu.: 4.0 1st Qu.: 1.00 1st Qu.: 6.00
## Median :32.0 Median : 6.0 Median : 3.00 Median : 10.00
## Mean :33.9 Mean : 7.5 Mean : 10.56 Mean : 18.09
## 3rd Qu.:43.0 3rd Qu.: 8.0 3rd Qu.: 9.50 3rd Qu.: 16.00
## Max. :72.0 Max. :58.0 Max. :546.00 Max. :604.00
## NA's :1 NA's :1
1. Las Vegas Strip Massacre: 604 victims
2. LA Dance Studio Mass Shooting: Oldest age for a mass shooter
3. West Middle School Killings: Youngest age
Since 2002 is a year without any major mass shooting incidents, I chose it as the reference point for my split statistics.
Before 2002, the story seems to involve only certain races; after 2002, it involves all races.
Remember that there are four decades before 2002 and only two decades after, yet the number of cases roughly doubles: 55 before 2002 versus 109 after.
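A split like this can be sketched in a few lines. The years in `toy_years` are made up for illustration; only the counts 55 and 109 and the "four decades versus two decades" framing come from the slide above:

```r
# Hypothetical incident years, not the real data.
toy_years <- c(1966, 1984, 1991, 1999, 2003, 2007, 2012, 2012, 2017, 2019, 2022)

# Label each incident relative to the reference year.
cutoff <- 2002
period <- ifelse(toy_years < cutoff, "before 2002", "after 2002")
counts <- table(period)
print(counts)

# The same comparison with the counts reported on the slide,
# expressed as cases per decade to account for the unequal time spans.
cases_before <- 55
cases_after <- 109
per_decade_before <- cases_before / 4
per_decade_after <- cases_after / 2
print(c(before = per_decade_before, after = per_decade_after))
```

Comparing per-decade rates rather than raw counts makes the unequal lengths of the two periods explicit.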
| Race | Average Age of the Shooters |
|---|---|
| White | ~ 28-29 years old |
| Latino | ~ 32-33 years old |
| Black | ~ 38-39 years old |
| Asian | ~ 41 years old |
| Native Am. | ~ 18 years old |
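Group averages like those in the table can be computed with base R's `aggregate()`. This is a minimal sketch on a made-up data frame; `toy` and its ages are hypothetical and will not reproduce the figures above:

```r
# Toy data with hypothetical ages; one age is missing on purpose.
toy <- data.frame(
  race = c("White", "White", "Black", "Black", "Asian"),
  age_of_shooter = c(25, 31, 40, NA, 41),
  stringsAsFactors = FALSE
)

# With the formula interface, rows with a missing age are dropped
# before the mean is taken (default na.action = na.omit).
avg_age <- aggregate(age_of_shooter ~ race, data = toy, FUN = mean)
print(avg_age)
```

A group with very few cases can produce an average that rests on a single observation, so it helps to report the group size next to the mean.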
| Gender | Percentage |
|---|---|
| Male | 97% |
| Female | 2.5% |
| Transgender | 0.5% |
| Gender | Most Likely |
|---|---|
| Male | Early 20’s to Mid 40’s |
| Female | Mid 20’s or Mid 40’s |
| Location | Frequency |
|---|---|
| Workplace | ~ 34% |
| School | ~ 16% |
| Bar/Club/Rest. | ~ 11% |
| Retail | ~ 10% |
| Other | ~ 9% |
| Religious Place | ~ 6% |
It is heartbreaking to see School ranked second on the list, which means many innocent children had their futures taken.
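A frequency table like the one above can be sketched with `table()` and `prop.table()`. The `toy_location` values are hypothetical and only illustrate the calculation:

```r
# Hypothetical location labels, not the real data.
toy_location <- c("Workplace", "Workplace", "School", "Other",
                  "Workplace", "School", "Retail", "Religious Place")

# Convert counts to percentages and sort from most to least frequent.
share <- sort(round(100 * prop.table(table(toy_location)), 1), decreasing = TRUE)
print(share)
```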
| Firearm | Percent of Carrying |
|---|---|
| Semi-Auto Handgun | ~ 41% |
| Semi-Auto Rifle | ~ 20% |
| Handgun(Old Versions) | ~ 6.7% |
| Rifle(Old Version) | ~ 5% |
| Assault Rifle | ~ 5% |
| Shotgun | ~ 4% |
| Race | Prior Mental Health Issues |
|---|---|
| Asian | 90% |
| White | 69% |
| Latino | 67% |
| Black | 43% |
Now the shooters with prior signs of mental health issues are added to the graph.
The incidents in which the shooter had prior mental health issues are plotted as plus signs (+) on the plot above.
Now we take those cases out of the plot to see what it looks like.
Compared to the original plot, the density of dots is visibly lower. Next, we quantify the difference between the cases with and without prior mental health issues.
The plots already suggest some remarks. Now we look at some numbers from the data to see how much mental health issues contribute to the problem.
# Count the cases with prior signs of mental health issues.
# (Grouping by year is not needed for a plain row count.)
my_data %>%
  filter(prior_signs_mental_health_issues == "Yes") %>%
  nrow()
## [1] 80
If we filter out the cases with prior mental health issues, eighty cases drop off the chart, which is almost half of the mass shooting cases in this data set since 1960.
# Count the cases in which the weapons were not obtained legally.
my_data %>%
  filter(weapons_obtained_legally == "No") %>%
  nrow()
## [1] 16
By contrast, if I cross off the cases in which the weapons were not obtained legally, only 16 cases drop off the chart, which is roughly 10% of all cases.
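The two shares quoted in the text follow directly from the counts, using the 164 rows reported by `glimpse(my_data)` as the denominator. The variable names below are only for illustration:

```r
total_cases <- 164        # rows reported by glimpse(my_data)
mental_health_yes <- 80   # cases with prior signs of mental health issues
illegal_weapons <- 16     # cases with weapons not obtained legally

share_mental_health <- round(100 * mental_health_yes / total_cases, 1)
share_illegal <- round(100 * illegal_weapons / total_cases, 1)

print(share_mental_health)  # about 48.8 percent
print(share_illegal)        # about 9.8 percent
```

Note that `filter()` drops rows where the condition evaluates to `NA`, so cases with unknown values count toward the denominator but toward neither share.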
People have long argued over whether we should strengthen background checks or tighten gun control laws. These numbers suggest that background checks matter more than gun control, and that checks on mental health issues are especially crucial. Cutting the cases by fifty percent would be ideal, but even a twenty or thirty percent reduction would save many lives.
| Prior Signs of Mental Health Issues | Age of the Shooter |
|---|---|
| Yes | around 23 and 40 |
| No | around 30 |
The graph suggests that incidents occur mostly on the East and West sides of the country, and that mass shootings are least likely in the Midwest.
Colorado ranks in the top 5 states for mass shootings even though its population does not rank in the top 20 nationwide.
Massachusetts, surprisingly, has no recorded mass shootings even though its population ranks in the top 16 nationwide.
State Population Source: https://www.statsamerica.org/sip/rank_list.aspx?rank_label=pop1
Sunday is the deadliest day of the week for mass shootings, but Monday is the day on which a mass shooter is most likely to act.
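"Deadliest" and "most likely" are two different measures: the first sums fatalities per day, the second counts incidents per day. A minimal sketch of both, assuming a made-up data frame (`toy` and its values are hypothetical):

```r
# Toy data with hypothetical values, not the real incidents.
toy <- data.frame(
  day_of_week = c("Monday", "Monday", "Monday", "Sunday", "Sunday", "Friday"),
  fatalities = c(3, 4, 5, 26, 10, 6),
  stringsAsFactors = FALSE
)

# Number of incidents per day of the week.
incidents_per_day <- table(toy$day_of_week)

# Total fatalities per day of the week.
fatalities_per_day <- tapply(toy$fatalities, toy$day_of_week, sum)

most_frequent_day <- names(which.max(incidents_per_day))
deadliest_day <- names(which.max(fatalities_per_day))

print(most_frequent_day)
print(deadliest_day)
```

A single very large incident can dominate the fatality total for its day, so the two rankings need not agree.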